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■*— > Abstract 

O 

^^ The Dynamic Time Warping (DTW) is a popular similarity measure 

between time series. The DTW fails to satisfy the triangle inequality and 
C its computation requires quadratic time. Hence, to find closest neigh- 

^_^ bors quickly, we use bounding techniques. We can avoid most DTW 

iiY^ computations with an inexpensive lower bound (LB_Keogh). We com- 

^ir pare LB_Keogh with a tighter lower bound (LBJmproved). We find that 

(— ^ LBJmproved-based search is faster for sequential search. As an example, 

C/2 our approach is 3 times faster over random-walk and shape time series. 

O We also review some of the mathematical properties of the DTW. We 

derive a tight triangle inequality for the DTW. We show that the DTW 
^y-^ becomes the h distance when time series are separated by a constant. 

> 

cn 1 Introduction 

^~^ Dynamic Time Warping (DTW) was initially introduced to recognize spoken 

f~^ words [1], but it has since been applied to a wide range of information re- 

^^ trieval and database problems: handwriting recognition [2, 3], signature recog- 

OO nition [4, 5], image de-interlacing [6], appearance matching for security pur- 

S^ poses [7], whale vocalization classification [8], query by humming [9, 10], clas- 

J> sification of motor activities [11], face localization [12], chromosome classifi- 

cation [13], shape retrieval [14, 15], and so on. Unlike the Euclidean distance, 
DTW optimally aligns or "warps" the data points of two time series (see Fig. 1). 
C^ When the distance between two time series forms a metric, such as the Eu- 

clidean distance or the Hamming distance, several indexing or search techniques 
have been proposed [16, 17, 18, 19, 20]. However, even assuming that we have a 
metric, Weber et al. have shown that the performance of any indexing scheme 
degrades to that of a sequential scan, when there are more than a few dimen- 
sions [21]. Otherwise — when the distance is not a metric or that the number 
of dimensions is too large — we use bounding techniques such as the Generic 
multimedia object indexing (GEMINI) [22]. We quickly discard (most) false 
positives by computing a lower bound. 
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Ratanamahatana and Keogh [23] argue that their lower bound (LB_Keogh) 
cannot be improved upon. To make their point, they report that LB_Keogh 
allows them to prune out over 90% of all DTW computations on several data 
sets. 

We are able to improve upon LB_Kcogh as follows. The first step of our 
two-pass approach is LB_Keogh itself. If this first lower bound is sufficient to 
discard the candidate, then the computation terminates and the next candidate 
is considered. Otherwise, we process the time series a second time to increase 
the lower bound. If this second lower bound is large enough, the candidate is 
pruned, otherwise we compute the full DTW. We show experimentally that the 
two-pass approach can be several times faster. 

The paper is organized as follows. In Section 4, we define the DTW in a 
generic manner as the minimization of the Ip norm (DTWp). In Section 5, we 
present various secondary mathematical results. Among other things, we show 
that if X and y are separated by a constant {x > c > y or x < c < y) then the 
DTWi is the li norm (see Proposition 2). In Section 6, we derive a tight triangle 
inequality for the DTW. In Section 7, we show that DTWi is good choice for 
time-series classification. In Section 8, we compute generic lower bounds on the 
DTW and their approximation errors using warping envelopes. In Section 9, 
we show how to compute the warping envelopes quickly and derive some of 
their mathematical properties. The next two sections introduce LB_Keogh and 
LBJmproved respectively, whereas the last section presents an experimental 
comparison. 

2 Conventions 

Time series are arrays of values measured at certain times. For simplicity, we 
assume a regular sampling rate so that time series are generic arrays of floating- 
point values. A time series x has length |a;|. Time series have length n and 
are indexed from 1 to n. The Ip norm of x is ||a;||p = (^^ ja^il^)"'^ for any 




Figure 1: Dynamic Time Warping example 



integer < p < oo and ||a;||oo = maxj \xi\. The Ip distance between x and y is 
||a; — y\\p and it satisfies the triangle inequality \\x — z\\p < \\x ~ y\\p + \\y — z\\.p 
for 1 < p < oo. Other conventions are summarized in Tabic 1. 



Tabic 1: Frequently used conventions 
|x| or n length 

||a;||p Ip norm 

DTWp monotonic DTW 

NDTWj, non-monotonic DTW 



w DTW locality constraint 

iV(/i, (j) normal distribution 

U{x),L{x) warping envelope (see Section 8) 

H(x,y) projection of a; on y (see Equation 1) 



3 Related Works 

Beside DTW, several similarity metrics have been proposed including the di- 
rected and general Hausdorff distance, Pearson's correlation, nonlinear elastic 
matching distance [24], Edit distance with Real Penalty (ERP) [25], Needleman- 
Wunsch similarity [26], Smith- Waterman similarity [27], and SimilB [28]. 

Dimensionality reduction, such as piecewise constant [29] or piecewise lin- 
ear [30, 31, 32] segmentation, can speed up retrieval under DTW distance. These 
techniques can be coupled with other optimization techniques [33]. 

The performance of lower bounds can be further improved if one uses early 
abandoning [34] to cancel the computation of the lower bound as soon as the 
error is too large. Boundary-based lower-bound functions sometimes outperform 
LBJKeogh [35]. Zhu and Shasha showed that computing a warping envelope 
prior to applying dimensionality reduction results in a tighter lower bound [10]. 
We can also quantize [36] or cluster [37] the time series. 

4 Dynamic Time Warping 

A many-to-many matching between the data points in time series x and the 
data point in time series y matches every data point Xi in x with at least one 
data point yj in y, and every data point in y with at least a data point in x. 
The set of matches {i,j) forms a warping path T. We define the DTW as the 
minimization of the Ip norm of the differences {xi — j/j}(i j)gr over all warping 
paths. A warping path is minimal if there is no subset V of T forming a warping 
path: for simplicity we require all warping paths to be minimal. 

In computing the DTW distance, we commonly require the warping to re- 
main local. For time series x and y, we do not align values Xi and yj if \i—j\ > w 
for some locality constraint w > [1]. When w — 0, the DTW becomes the Ip 



distance whereas when w > n, the DTW has no locality constraint. The value 
of the DTW diminishes monotonically as w increases. 

Other than locality, DTW can be monotonic: if we align value Xi with value 
yj, then we cannot align value Xi^i with a value appearing before yj {yji for 

/<J). 

We note the DTW distance between x and y using the Ip norm as DTWp(a;, y) 
when it is monotonic and as NDTWp(x, y) when monotonicity is not required. 

By dynamic programming, the monotonic DTW requires 0{wn) time. A 
typical value of w is n/10 [23] so that the DTW is in 0{n^). To compute 
the DTW, we use the following recursive formula. Given an array x, we write 
the suffix starting at position i, xu^ — Xi, x^+i, . . . ,Xn. The symbol ® is the 
exclusive or. Write qij = DTWp(x(i),2/(j))P so that DTWp(x, y) — y^TJ, then 

'O if|a;(,)| = |(y(,)| = 

if |a:(,)| = 0®|y(,)| = 



1^,J 



k»-%r- 



™in(9i+ij-,9ij+i,<Zi+ij+i) 



or I « — J I > w 
otherwise. 



Forp = oo, we rewrite the preceding recursive formula with qij — DTWoo(a;(i), J/(j)), 
and qi^j = max(|a;i - yj|,min(gj+ij,grjj+i,(ji+ij+i)) when |x(i)| ^ 0, |y(j)| ^ 0, 
and |« — j| < w. 

We can compute NDTWi without time constraint in 0(n log n) [38]: if the 
values of the time series are already sorted, the computation is in 0{n) time. 

We can express the solution of the DTW problem as an alignment of the 
two initial time series (such as a; = 0, 1, 1, and y = 0, 1, 0, 0) where some of the 
values are repeated (such as x' = 0, 1, 1, 0, and y' = 0, 1, 1, 0, 0). If we allow 
non- monotonicity (NDTW), then values can also be inverted. 

The non-monotonic DTW is no larger than the monotonic DTW which is 
itself no larger than the Ip norm: NDTWp(x,y) < DTWp(x, y) < \\x — y\\p for 
all < p < oo. 

5 Some Properties of Dynamic Time Warping 

The DTW distance can be counterintuitive. As an example, if x,y,z are 
three time series such that x < y < z pointwise, then it docs not follow that 
DTWp(a;,z) > DTWp(;2,y). Indeed, choose x = 7,0,1,0, y = 7,0,5,0, and 
z = 7,7,7,0, then T)TWoc{z,y) — 5 and DTWoo(2:,a;) = 1. Hence, we review 
some of the mathematical properties of the DTW. 

The warping path aligns Xi from time series x and yj from time series y if 
(z, j) G r. The next proposition is a general constraint on warping paths. 

Proposition 1. Consider any two time series x and y. For any minimal warp- 
ing path, if Xi is aligned with yj, then either Xi is aligned only with yj or yj is 
aligned only with Xi. 



Proof. Suppose that the result is not true. Then there is Xk,Xi and yi,yj such 
that Xk and Xi are aUgned with yj, and yi and yj are aligned with Xi. We can 
delete {k,j) from the warping path and still have a warping path. A contradic- 
tion. D 

Hence, we have that the cardinality of the warping path is no larger than 
2n. Indeed, each match (z, j) G T must be such that i or j only occurs in this 
match by the above proposition. 

The next lemma shows that the DTW becomes the Ip distance when cither 
X or y is constant. 

Lemma 1. For any < p < oo, if y ^ c is a constant, then NDTWp{x, y) = 
DTWp{x,y) = \\x-y\\p. 

When p ~ oo, a. stronger result is true: if y — a; + c for some constant c, 
then NDTWoo(a;,y) = DTWoo(x,y) ^ \\x - y\\oo- Indeed, NDTWoo(x,y) > 
I max(y) — max(a;)| = c = \\x — y\\oo > \\x — y||oo which shows the result. This 
same result is not true for p < oo: for x = 0,1,2 and y = 1,2,3, we have 
ll^^ ^ y\\p = "v^ whereas DTWp(x,y) = ^^. However, the DTW is translation 
invariant: DTWp(x, z) ^ DTWp(x + 6, z + b) and NDTWp(a;, z) = NDTWp(x + 
h,z + b) for any scalar b and < p < oo. 

The DTWi has the property that if the time series are value-separated, then 
the DTW is the li norm as the next proposition shows. 

Proposition 2. // x and y are such that either x>c>yorx<c<y for 
some constant c, then DTWi{x, y) — NDTWi{x, y) — \\x — y\\i. 

Proof. Assume x > c > y, there exists x',y' such that x' > c > y' and 
NDTWi(a;,2/) = \\x' - y'\\, = E. \< - v'^l - E. \< - c\ + \c - y[\ = ||x' - 
c||i + l|c — y'lli > \\x — c\\i + \\c — y\\i = \\x — y\\i. Since we also have 
NDTWi(a;,y) < DTWi(a;,y) < \\x - y\\i, the equality follows. D 



Proposition 2 docs not hold for p > I: DTW2((0, 0, 1,0), (2, 3, 2, 2)) = V17 
whereas ||(0, 0,1,0) - (2,3,2,2)||2 = \/T8. 

In classical analysis, we have that n^/^~^/''||x||q > ||a;||p [39] for 1 < p < 
g < oo. A similar results is true for the DTW and it allows us to conclude that 
DTWp(x,y) and NDTWp(a;,y) decrease monotonically as p increases. 

Proposition 3. For I < p < q < oo, we have that {2ny/P^^^'^DTWq{x,y) > 
DTWp{x,y) where \x\ = \y\ = n. The result also holds for the non-monotonic 
DTW. 

Proof. The argument is the same for the monotonic or non-monotonic DTW. 
Given x,y consider the two aligned (and extended) time series x',y' such that 
DTWq{x,y) — \\x' — y'Wq. As a consequence of Proposition 1, we have \x'\ = 
\y'\ < 2n. From classical analysis, we have |a;'|"'"/^~"'"/''||a:;' — y'\\q > \\x' — y'\\p, 
hence \2n\^/P-^/'i\\x' -y'\\q > ||a;'-y'|lp or |2n|i/P-i/«DTW,(x, y) > \\x'-y'\\p. 
Since x' , y' represent a valid warping path of x,y, then j|a;' — y'||p > DTWp(x, y) 
which concludes the proof. D 



6 The Triangle Inequality 

The DTW is commonly used as a similarity measure: x and y are similar if 
DTWp{x,y) is small. Similarity measures often define equivalence relations: 
A ^ A for all A (reflexivity) , A ^ B ^ B ^ C (symmetry) and A ^ B /\ B ^ 
C => A^ C (transitivity) . 

The DTW is reflexive and symmetric, but it is not transitive. Indeed, con- 
sider the following time series: 

X = 0,0,..., 0,0, 

27n+l times 

y^ 0,0,. . ., 0,0 , £, 0,0,. . ., 0,0 , 

m times m times 

Z = 0, 6, e, . . . ,e, e, 0. 

2m— 1 times 

We have that NDTWp(X, Y) = DTWp(X, Y)^\e\, NDTWp(y, Z) = DTWp(y, Z) 
0, NDTWp(X,Z) = DTWp(X,Z) ^ ^(2m- l)|e| for 1 < p < oo and w = 
m — 1. Hence, for e small and n ^ 1/e, we have that X ^Y and F ~ Z, but 
X rfj Z . This example proves the following lemma. 

Lemma 2. For 1 < p < oo and w > 0, neither DTWp nor NDTWp satisfies a 
triangle inequality of the form d{x, y) +d{y, z) > cd{x, z) where c is independent 
of the length of the time series and of the locality constraint. 

This theoretical result is somewhat at odd with practical experience. Casacu- 
berta et al. found no triangle inequality violation in about 15 million triplets 
of voice recordings [40]. To determine whether we could expect violations 
of the triangle inequality in practice, we ran the following experiment. We 
used 3 types of 100-sample time series: white-noise times series defined by 
Xi = N(0, 1) where N is the normal distribution, random-walk time series de- 
fined by Xi = Xi-i + 7V(0, 1) and xi = 0, and the Cylinder-Bell-Funnel time 
series proposed by Saito [41]. For each type, we generated 100 000 triples of 
time series x, y, z and we computed the histogram of the function 



DTWp(a;,y) + DTWp(y,z) 

for p = 1 and p = 2. The DTW is computed without time constraints. Over 
the white-noise and Cylinder-Bell- Funnel time series, we failed to find a single 
violation of the triangle inequality: a triple a;,y, z for which C{x,y,z) > 1. 
However, for the random-walk time series, we found that 20% and 15% of the 
triples violated the triangle inequality for DTWi and DTW2. 

The DTW satisfies a weak triangle inequality as the next theorem shows. 

Theorem 1. Given any 3 same-length time series x,y,z and 1 < p < 00, we 



have 

DTWp{x,z) 



DTWp{x,y) + DTWp{y,z)> 



niin(2u' + 1, n)^^P 



where w is the locality constraint. The result also holds for the non-nionotonic 
DTW. 

Proof. Let F and V be minimal warping paths between x and y and between y 
and z. Let F" = {(i, j, fc)|(i, j) <E F and (j, fc) e F'}. Iterate through the tuples 
{i,j,k) in F" and construct the same-length time series x",y" ,z" from Xi, yj, 
and Zfc. By the locality constraint any match (i,j) G F corresponds to at most 
min(2w + l,n) tuples of the form {i,j,-) G F", and similarly for any match 
(j,fc) G F'. Assume 1 < p < oo. We have that \\x" - y"\\l == Y. 



P Z-t(i,j,k)er" 



X,, 



yj\P < min{2w + l,n)BTWp{x,y)P and \\y" - z"|l^ - E(.,,,fc)6r" 1% " ^k[' < 
min(2w + 1, n)T)TWp{y, z)p. By the triangle inequality in Ip, we have 

min(2ii; + 1, n)i/P(DTWp(x, y) + DTWp(y, z)) > \\x" - y"\\p + \\y" - z"\\p 

> \\x" -z"\\p >DTWp(x,z). 

For p = oo, max(ij^fc)gp// \\xi - yj\\P = DTWoo{x,y)P and max(,j_fc)gr" IVj - 
Zk\^ — DTWoo(2/, ^)^> thus proving the result by the triangle inequahty over l^o- 
The proof is the same for the non-monotonic DTW. D 



The constant min(2ii; + 1, 77,)^/^ is tight. Consider the example with time se- 
ries X, Y, Z presented before Lemma 2. We have DTWp(X, Y) -|-DTWp(r, Z) = 
\e\ and DTWp{X,Z) = ^(2u; + l)|e|. Therefore, we have 

BTWpiX, Y) + DTWp(y, Z) = °TWp(X, Z) 



iin(2w+ l,n)Vp' 

A consequence of this theorem is that DTWoo satisfies the traditional trian- 
gle inequality. 

Corollary 1. The triangle inequality d{x, y)+d(y, z) > d(x, z) holds for DTWoo 
and NDTWoo- 

Hence the DTWoo is a pseudometric: it is a metric over equivalence classes 
defined by a; ^ y if and only if DTWoo (a;, y) — 0. When no locality constraint 
is enforced, DTWoo is equivalent to the discrete Frchet distance [42]. 

7 Which is the Best Distance Measure? 

The DTW can be seen as the minimization of the Ip distance under warping. 
Which p should we choose? Legrand et al. reported best results for chromosome 
classification using DTWi [13] as opposed to using DTW2. However, they did 
not quantify the benefits of DTWi. Morse and Patel reported similar results 
with both DTWi and DTW2 [43]. 
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While they do not consider the DTW, Aggarwal et al. [44] argue that out of 
the usual Ip norms, only the h norm, and to a lesser extend the I2 norm, express 
a qualitatively meaningful distance when there are numerous dimensions. They 
even report on classification-accuracy experiments where fractional Ip distances 
such as lo,i and I0.5 fare better. Frangois et al. [45] made the theoretical result 
more precise showing that under uniformity assumptions, lesser values of p are 
always better. 

To compare DTWi, DTW2, DTW4 and DTWoo, we considered four different 
synthetic time-series data sets: Cylinder-Bell- Funnel [41], Control Charts [46], 
Waveform [47], and Wave-|- Noise [48]. The time series in each data sets have 
lengths 128, 60, 21, and 40. The Control Charts data set has 6 classes of time 
series whereas the other 3 data sets have 3 classes each. For each data set, we 
generated various databases having a different number of instances per class: 
between 1 and 9 inclusively for Cylinder-Bell-Funnel and Control Charts, and 
between 1 and 99 for Waveform and Wave-|-Noise. For a given data set and 
a given number of instances, 50 different databases were generated. For each 
database, we generated 500 new instances chosen from a random class and we 
found a nearest neighbor in the database using DTWp for p = 1,2, 4, 00 and 
using a time constraint of w = n/10. When the instance is of the same class as 
the nearest neighbor, we considered that the classification was a success. 

The average classification accuracies for the 4 data sets, and for various 
number of instances per class is given in Fig. 2. The average is taken over 
25 000 classification tests (50 x 500), over 50 different databases. 

Only when there are one or two instances of each class is DTWoo competitive. 
Otherwise, the accuracy of the DTWoo-based classification does not improve as 
we add more instances of each class. For the Waveform data set, DTWi and 
DTW2 have comparable accuracies. For the other 3 data sets, DTWi has a 
better nearest-neighbor classification accuracy than DTW2. Classification with 
DTW4 has almost always a lower accuracy than either DTWi or DTW2. 

Based on these results, DTWi is a good choice to classify time series whereas 
DTW2 is a close second. 



8 Computing Lower Bounds on the DTW 

Given a time series x, define U{x)i = maxfc{a;fc| |fc — i| < w} and L{x)i = 
minfe{a:;fc| |fc — z| < w} for i = l,...,n. The pair U{x) and L{x) forms the 
warping envelope of x (sec Fig. 3). We leave the time constraint w implicit. 

The theorem of this section has an elementary proof requiring only the fol- 
lowing technical lemma. 

Lemma 3. If b e [a, c] then (c - a)^ > (c - hY + {b - a)P for 1 < p < 00. 

Proof. For p — 1, (c — b)P -I- (6 — a)^ = (c — a)^. For p > 1, by deriving 
(c — by + {b — a)P with respect to b, we can show that it is minimized when 
6 = (c + a)/2 and maximized when b G {a, c}. The maximal value is (c — a)^. 
Hence the result. D 





(a) Cylinder-Bell-Funnel 



(b) Control Charts 
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Figure 2: Classification accuracy versus the number of instances of each class 
in four data sets 
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Figure 3: Warping envelope example 



The following theorem introduces a generic result that we use to derive 
two lower bounds for the DTW including the original Keogh-Ratanamahatana 
result [29]. 

Theorem 2. Given two equal-length time series x and y and I < p < oo, then 
for any time series h satisfying Xi > hi > U{y)i or Xi < hi < L{y)i or hi = Xi 
for all indexes i, we have 

DTWp{x,y)P > NDTWp{x,yr 

>\\x-h\\P + NDTWp{h,y)P. 

For p ~ oo, a similar result is true: DTWooi^x^ y) > NDTWca{x, y) > niax(||x — 
h\\oo,NDTW^{h,y)). 

Proof. Suppose that 1 < p < oo. Let F be a warping path such that NDTWp(x, y)P 
S(i i)er \^i~yj\p- ^y t^^ constraint on h and Lemma 3, we have that \xi—yj\P > 
\xi -~ ht\P + |/ij - yj\P for any {i,j) e F since hi e [mlii{xi, yj),Taa,x{xt, y^)]. 
Hence, we have that NDTWp(a;,y)P > E(ij)er \^i ^ '^'^\^ + \'^'^ ~ Vjl''' > W^ ^ 
Hp + Y.(ij}er \^i ~ %l^- This proves the result since Y.{i,j)er l^i ^ %l ^ 
NDTWp(/i, y). For p = oo, wc have that NDTWoo(a;, y) = max(i^j)gr \xi — yj\ < 
max(ij)grmax(|xj-/ii|,|/ij-2/j|) = max(||a;-/i||oo,NDTWoo(/i,y)), concluding 
the proof. D 

While Theorem 2 defines a lower bound (||x — h\\p), the next proposition 
shows that this lower bound must be a tight approximation as long as h is close 
to y in the Ip norm. 

Proposition 4. Given two equal-length time series x and y, and 1 < p < oo 
with h as in Theorem 2, we have that \\x — h\\p approximates both DTWp{x,y) 
and NDTWp{x,y) within \\h — y\\p. 

Proof. By the triangle inequality over ^p, we have ||a; — /i||p + ||/i — y||p > ||x — 2/||p. 
Since \\x - y\\p > DTWp(a;,j/), we have ||x — h\\p + \\h - y\\p > DTWp(a;,y), 
and hence \\h — y\\p > DTWp{x,y) — \\x — h\\p. This proves the result since by 
Theorem 2, we have that DTWp(x, y) > NDTWp(a;, y) > \\x - h\\p. D 

This bound on the approximation error is reasonably tight. If x and y are 
separated by a constant, then DTWi(x,y) = \\x — y\\i by Proposition 2 and 

lk-y||i = J2i \^i-yt\ = J2i \xi-hi\ + \hi-yi\ = \\x-h\\i + \\h-y\\i. Hence, 
the approximation error is exactly ||/i — j/||i in such instances. 

9 Warping Envelopes 

The computation of the warping envelope U{x),L(x) requires 0{nw) time using 
the naive approach of repeatedly computing the maximum and the minimum 
over windows. Instead, we compute the envelope using at most 3n comparisons 
between data-point values [49] using Algorithm 1. 
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Algorithm 1 Streaming algorithm to compute the warping envelope using no 
more than 3n comparisons 

input a time series a indexed from 1 to n 

input some DTW time constraint w 

return warping envelope U, L (two time series of length n) 

u, I ^- empty double-ended queues, we append to "back" 

append 1 to u and I 

for i in {2, . . . ,n} do 
if i > w + 1 then 

Ui — u) "^ ^front(u)5 i-^i — w ^ ^irQnt(l) 

if tti > tti-i then 
pop u from back 
while ai > aback(n) do 
pop u from back 
else 

pop I from back 
while tti < aback(0 do 
pop I from back 
append i to u and I 
if i = 2w + 1 + front (m) then 

pop u from front 
else if i = 2w + 1 + front(/) then 
pop I from front 
for i in {n + 1, . . . ,n + w} do 

Ui — w ^ ^front{u)i i^i — w ^ '^front(Z) 

if i-front(u)> 2w + 1 then 

pop u from front 
if i-front(0> 2w + 1 then 

pop I from front 
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Envelopes are asymmetric in the sense that if x is in the envelope of y {L{y) < 
X < U{x)), it does not follow that x is in the envelope of y (L(x) < y < U{x)). 
For example, x = 0,0, ... ,0 is in the envelope of y = 1, —1, 1, —1, 1, —1, . . . , 1 
for w > I, but the reverse is not true. However, the next lemma shows that if x 
is below or above the envelope of y, then y is above or below the envelope of x. 

Lemma 4. L{x) > y is equivalent to x > U{y). 

Proof. Suppose Xi < U{y)i for some i, then there is j such that \i ~ i\ < w and 
Xi < yj, therefore L{x)j < yj. It follows that L(x) > y implies x > U{y). The 
reverse implication follows similarly. D 

Wc know that L{h) is less or equal to h whereas U{h) is greater or equal to 
h. The next lemma shows that U{L{h)) is less or equal than h whereas L{U{h)) 
is greater or equal than h. 

Lemma 5. We have U{L{h)) < h and L{U{h)) > h for any h. 

Proof. By definition, wc have that L{h)j < hi whenever |i — j| < w. Hence, 
maxj||j_j|<^ L(/i)j < /li which proves [/(i(/i)) < h. The second result {L{U{h)) > 
h) follows similarly. D 

Whereas L{U{h)) is greater or equal than h, the next lemma shows that 
U{L{U{h))) is equal to U{h). 

Corollary 2. We have U{h) = U{L{U{h))) and L{h) = L{U{L{h))) for any h. 

Proof. By Lemma 5, wc have L{U{h)) > h, hence U{L{U{h))) > U{h). Again 
by Lemma 5, we have U{L{h')) < h' for h' = U{h) or U{L{U{h))) < U{h). 
Hence, U{h) ^ U{L{U{h))). The next result {L{h) = L{U{L{h)))) follows 
similarly. D 

10 LB_Keogh 

Let -ff (x, y) be the projection of x on y defined as 

fC/(y). ifx, >[/(y), 
H{x,y)i^ lL{y)i li x^ < L{y)i (1) 

yxi otherwise, 

for i — 1,2, ... ,n. We have that H(x, y) is in the envelope of y. By Theorem 2 
and setting h ^ H{x,y), wc have that NDTWp(a;,y)P > \\x - H{x,y)\\P + 
NDTW{H{x,y),yY for 1 < p < oo. Write LB_Kcoghp(a;, y) = ||a; - H{x,y)\\p 
(see Fig. 4). The following corollary follows from Theorem 2 and Proposition 4. 

Corollary 3. Given two equal-length time series x and y and 1 < p < oo, then 

• LB_Keogh {x,y) is a lower hound to the DTW: 

DTWp{x,y) > NDTWp{x,y) > LB_Keogh^{x,y); 
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Figure 4: LB_Keogh example: the area of the marked region is LB_Keogh]^(a;, y) 



• the accuracy of LB_Keogh is bounded by the distance to the envelope: 
DTWp{x,y) - LB_Keoghp{x,y) < || max{[/(y)i -yi,yi - L{y)i}i\\p 
for all X. 

Algorithm 2 shows how LB_Keogh can be used to find a nearest neighbor in a 
time series database. The computation of the envelope of the query time series is 
done once (see line 4). The lower bound is computed in lines 7 to 12. If the lower 
bound is sufficiently large, the DTW is not computed (see line 13). Ignoring 
the computation of the full DTW, at most (27V + 3)n comparisons between data 
points are required to process a database containing N time series. 



11 LB .Improved 

Write LBJmproved (x, y)'' = LB_Keogh (x, y)^ + LB_Keogh (y, iJ(a;, y))P for 
1 < p < oo. By definition, we have LBJmprovedp(a;,2/) > LB_Keogh (a;,y). 
Intuitively, whereas LB_Keogh (x, y) measures the distance between x and the 
envelope of y, LB_Keogh (y,iy(a;,y)) measures the distance between y and the 
envelope of the projection of x on y (see Fig. 5). The next corollary shows that 
LB Jmproved is a lower bound to the DTW. 

Corollary 4. Given two equal-length time series x and y and \ < p < oo, then 
LB_Improvedp{x, y) is a lower bound to the DTW: DTWp{x, y) > NDTWp{x, y) > 
LBJmprovedp{x, y) . 

Proof. Recall that LB_Keoghp(a;,y) = \\x — H{x,y)\\p. First apply Theorem 2: 
DTWp(x,y)P > NDTWp(x,y)P > LB_Keoghp(x, y)^ + NDTWp(ff(x, y),y)P. 
Apply Theorem 2 once more: NDTWp(y, iJ(a:,y))P > LBJ<:eoghp(y, iJ(a;, y))^. 
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Algorithm 2 LB_Keogh-based Nearest-Neighbor algorithm 



1 


input a time series a indexed from 1 to n 


2 


input a set S of candidate time series 


3 


return the nearest neighbor i? to a in S under DTWi 


4 


f/, 1/ <— envelope(a) 


5 


b *— oo 


6 


for candidate c in 5 do 


7 


/3^0 


8 


for i G {1,2, ...,n} do 


9 


if Ci > Ui then 


10 


/3 ^ /? + c, - [/, 


11 


else if Ci < Li then 


12 


(5^ f3 + U-c, 


13 


if /3 <b then 


14 


f «- DTWi(a,c) 


15 


if t < fe then 


16 


6^t 


17 


B^c 




100 120 



Figure 5: LBJmproved example: the area of the marked region is 
LB Jmproved^ (x, y) 
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By substitution, we get 'DTWp{x,y)P > N'DTWp{x,y)P > LB _Kcog\{x , y)P + 
LBJicoghJy, H{x,y))P thus proving the result. D 

Algo. 3 shows how to apply LBJmproved as a two-step process. Initially, 
for each candidate c, we compute the lower bound LBJKeogh]^(c, a) (see lines 8 
to 15). If this lower bound is sufficiently large, the candidate is discarded (see 
line 16), otherwise we add LB_Keogh]^(a, 7J(c, a)) to LB_Keogh]^(c, a), in effect 
computing LBJmproved]^(c, a) (see lines 17 to 22). If this larger lower bound is 
sufficiently large, the candidate is finally discarded (see line 23). Otherwise, we 
compute the full DTW. If a is the fraction of candidates pruned by LBJKeogh, 
at most (27V + 3)n + 5(1 — a)Nn comparisons between data points are required 
to process a database containing N time series. 

Algorithm 3 LBJmproved-based Nearest-Neighbor algorithm 



1: 

2: 

3: 

4: 

5: 

6: 

7: 

8: 

9: 

10: 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20: 

21 

22 

23 

24: 

25 

26 

27: 



input a time series a indexed from 1 to n 
input a set S of candidate time series 
return the nearest neighbor _B to a in S under DTWi 
U,L ^- envelope(a) 
& ^ oo 

for candidate c in 5* do 
copy c to c 

for i G {1, 2, . . . ,n} do 
if Ci > Ui then 

c'i = Ui 
else if Ci < Li then 

(5^(3 + L,-c, 
Ci = Li 
if 13 <b then 

[/', L' <— envolope(c') 
for i € {1, 2, . . . ,n} do 
if tti > Ui then 

P^f3 + ai-m 
else if tti < L'i then 

/3 ^ /3 + L; - tti 
if P <b then 
t^DTWi(a,c) 
if i < 6 then 

B ^c 



12 Comparing LB Keogh and LB Improved 

In this section, we benchmark Algorithms 2 and 3. We know that the LBJmproved 
approach has at least the pruning power of the LB_Keogh-based approach, but 
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Figure 6: Nearest-Neighbor Retrieval for the Cyhnder-Bell-Funnel data set 



does more pruning translate into a faster nearest-neighbor retrieval under the 
DTW distance? 

We implemented the algorithms in C++ and called the functions from 
Python scripts. We used the GNU GCC 4.0.2 compiler on an Apple Mac Pro, 
having two Intel Xeon dual-core processors running at 2.66 GHz with 2GiB of 
RAM. All data was loaded in memory before the experiments, and no thrashing 
was observed. We measured the wall clock total time. In all experiments, we 
benchmark nearest-neighbor retrieval under the DTWi with the locality con- 
straint w set at 10% {w — n/10). To ensure reproducibility, our source code is 
freely available [50], including the script used to generate synthetic data sets. 
We compute the full DTW using a straight-forward 0(n^)-time dynamic pro- 
gramming algorithm. 

12.1 Synthetic data sets 

We tested our algorithms using the Cylinder-Bell- Funnel [41] and Control Charts [46] 
data sets, as well as over a database of random walks. We generated 1 000- 
sample random-walk time series using the formula Xi — Xi-i + N(Q, 1) and 
xi — 0. Results for the Waveform and Wave+Noise data sets are similar and 
omitted. 

For each data set, we generated a database of 10 000 time series by adding 
randomly chosen items. The order of the candidates is thus random. Fig. 6, 
7 and 8 show the average timings and pruning ratio averaged over 20 queries 
based on randomly chosen time series as we consider larger and large fraction 
of the database. LB Jmproved prunes between 2 and 4 times more candidates 
and it is faster by a factor between 1.5 and 3. 

12.2 Shape data sets 

For the rest of the section, we considered a large collection of time-series derived 
from shapes [37, 51]. The first data set is made of heterogeneous shapes which 
resulted in 5 844 1 024-sample times series. The second data set is an arrow- 
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Figure 7: Nearest-Neighbor Retrieval for the Control Charts data set 
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Figure 8: Nearest-Neighbor Retrieval for the random-walk data set 
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Figure 9: Nearest-Neighbor Retrieval for the heterogeneous shape data set 
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Figure 10: Nearest-Neighbor Retrieval for the arrow-head shape data set 



head data set with of 15 000 251-saniple time series. We shuffled randomly each 
data set so that candidates appear in random order. We extracted 50 time series 
from each data set, and we present the average nearest-neighbor retrieval times 
and pruning power as we consider various fractions of each database (see Fig. 9 
and 10). The results are similar: LBJmproved has twice the pruning power and 
is faster by a factor of 3. 

13 Conclusion 

We have shown that a two-pass pruning technique can improve the retrieval 
speed by up to three times in several time-series databases. We do not use more 
memory. 

We expect to be able to significantly accelerate the retrieval with paral- 
lelization. Several instances of Algo. 3 can run in parallel as long as they can 
communicate the distance between the time series and the best candidate. 
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